Automatically Creating Biomedical Bibliographic Records from Printed Volumes of Old Indexes

Author

  • Daniel X. Le
Abstract

To provide online access to citations from old hardcopy indexes published from 1879 through 1965, an R&D division of the National Library of Medicine (NLM) is developing an automated system that converts bibliographic information in volumes of the printed Quarterly Cumulative Index Medicus (QCIM) to machine-readable form for inclusion in the OLDMEDLINE® database. The system processes images scanned from a QCIM volume, segments and labels the image records, identifies multiple occurrences of the same record in the volume, and creates unique citation records. Record segmentation and labeling rely on a smearing-based, bottom-up approach to text block segmentation, on the document page layout formats, and on a set of record-labeling rules derived from the QCIM document format guideline. Because a QCIM document can list the same bibliographic information under both “author entries” and “subject entries”, duplicate records have to be detected and combined to create a single unique citation. Duplicates are identified by matching “cross-reference” information, such as author names, journal title abbreviation, volume, pagination, month, and year, among the different entries of the same citation. The same “cross-reference” information can also be used to correct OCR errors, which improves the quality of the citations created. The system was evaluated on a QCIM volume published in 1929 that consists of 95,717 citation records. The evaluation shows that building the proposed data conversion system is technically feasible and cost-effective.
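The duplicate-detection step described in the abstract can be illustrated with a minimal Python sketch. It is not the NLM system's implementation: the `Entry` fields, the `norm` helper, and the use of exact matching on a normalized key are all assumptions made for illustration. A real system would likely need approximate matching to tolerate OCR errors.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One labeled QCIM record (hypothetical field layout)."""
    kind: str      # "author" or "subject" entry
    authors: str
    journal: str   # journal title abbreviation
    volume: str
    pages: str
    month: str
    year: str


def norm(text: str) -> str:
    """Lowercase and drop punctuation/spacing so 'J. A. M. A.' equals 'J.A.M.A.'."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def cross_reference_key(entry: Entry) -> tuple:
    """Build the matching key from the cross-reference fields named in the abstract."""
    fields = (entry.authors, entry.journal, entry.volume,
              entry.pages, entry.month, entry.year)
    return tuple(norm(f) for f in fields)


def group_duplicates(entries):
    """Group entries that share a key; each group stands for one unique citation."""
    groups = defaultdict(list)
    for entry in entries:
        groups[cross_reference_key(entry)].append(entry)
    return list(groups.values())
```

Each returned group collects the author and subject entries that appear to describe the same citation. Merging a group into a single record, and comparing fields across its members to spot OCR errors, would be the natural next steps.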

Similar resources

The Functional Requirements for Bibliographic Records Model: A New Approach to Organizing Bibliographic Elements

Functional Requirements for Bibliographic Records (FRBR) is a conceptual model for the arrangement of bibliographic records in catalogs and databases which was proposed in IFLA 1997, following a plan for revising Anglo-American Cataloging Rules (AACR). This model is inclined to be separated from the other cataloging rules, and uses a new structure for storing and displaying bibliographic record...

Historical Author Affiliations Assist Verification of Automatically Generated MEDLINE® Citations

High OCR error rates encountered in author affiliations increase the manual labor needed to verify MEDLINE citations automatically created from scanned journal articles. This is due to poor OCR recognition of the small text and italics frequently used in printed affiliations. Using author-affiliation relationships found in existing MEDLINE records, the SeekAffiliation (SA) program automatically...

Text Verification in an Automated System for the Extraction of Bibliographic Data

An essential stage in any text extraction system is the manual verification of the printed material converted by OCR. This proves to be the most labor-intensive step in the process. In a system built and deployed at the National Library of Medicine to automatically extract bibliographic data from scanned biomedical journals, alternative means were considered to validate the text. This paper des...

A Feasibility Study of Resource Description and Access (RDA) Implementation in Manuscripts’ Bibliographic Records in Iran

This study investigated the feasibility of implementing Resource Description and Access (RDA) in manuscripts’ bibliographic records. Paper type: This is applied research, based on a research-and-development approach that uses documentary and comparative methods. Findings: The findings prove that out of the identified el...

Retrospective Conversion of Old Bibliographic Catalogues

This paper describes a framework for retrospective document conversion in the library domain. Drawing on the experience and insight gained from the MORE project launched over the present decade by the European Commission, it outlines the requirements for solving the problem of retroconversion of old catalogues into UNIMARC format. Based on OCR techniques and automatic structure recognition, the sy...

Publication date: 2005